A model for the evolution of reinforcement learning in fluctuating games
نویسندگان
چکیده
Many species are able to learn to associate behaviors with rewards as this gives fitness advantages in changing environments. Social interactions between population members may, however, require more cognitive abilities than simple trial-and-error learning, in particular the capacity to make accurate hypotheses about the material payoff consequences of alternative action combinations. It is unclear in this context whether natural selection necessarily favors individuals to use information about payoffs associated to non-tried actions (hypothetical payoffs), as opposed to simple reinforcement of realized payoff. Here, we develop an evolutionary model where individuals are genetically determined to use either trial-and-error learning or learning based on hypothetical reinforcements, and ask what is the evolutionarily stable learning rule under pairwise symmetric two-action stochastic repeated games played during lifespan. We analyze through stochastic approximation theory and simulations the learning dynamics on the behavioral timescale, and derive conditions where trial-and-error learning outcompetes hypothetical reinforcement learning on the evolutionary timescale. This occurs in particular under repeated cooperative interactions with the same partner. By contrast, we find that hypothetical reinforcement learners tend to be favored under random interactions, but stable polymorphisms can also obtain where trial-and-error learners are maintained at a low frequency. We conclude that specific game structures can select for trial-and-error learning even in the absence of costs of cognition, which illustrates that cost-free increased cognition can be counterselected under social interactions.
منابع مشابه
Relationship between Designing Computer-based Educational Games, and Learning Motivation among Elementary Students
Background: Motivation is an important factor in learning. Educational games increase the learning motivation and understanding of students by creating a sense of joy, satisfaction and involvement. However, it is necessary to incorporate learning elements into the games, differently. In this study, the researcher tried to provide a model for designing educational games and determining its relat...
متن کاملThe Evolution of Coding in Signaling Games
Signaling games with reinforcement learning have been used to model the evolution of term languages (Lewis 1969, Convention. Cambridge, MA: Harvard University Press; Skyrms 2006, “Signals” Presidential Address. Philosophy of Science Association for PSA). In this article, syntactic games, extensions of David Lewis’s original sender– receiver game, are used to illustrate how a language that explo...
متن کاملReinforcement learning based feedback control of tumor growth by limiting maximum chemo-drug dose using fuzzy logic
In this paper, a model-free reinforcement learning-based controller is designed to extract a treatment protocol because the design of a model-based controller is complex due to the highly nonlinear dynamics of cancer. The Q-learning algorithm is used to develop an optimal controller for cancer chemotherapy drug dosing. In the Q-learning algorithm, each entry of the Q-table is updated using data...
متن کاملUtilizing Generalized Learning Automata for Finding Optimal Policies in MMDPs
Multi agent Markov decision processes (MMDPs), as the generalization of Markov decision processes to the multi agent case, have long been used for modeling multi agent system and are used as a suitable framework for Multi agent Reinforcement Learning. In this paper, a generalized learning automata based algorithm for finding optimal policies in MMDP is proposed. In the proposed algorithm, MMDP ...
متن کاملOperation Scheduling of MGs Based on Deep Reinforcement Learning Algorithm
: In this paper, the operation scheduling of Microgrids (MGs), including Distributed Energy Resources (DERs) and Energy Storage Systems (ESSs), is proposed using a Deep Reinforcement Learning (DRL) based approach. Due to the dynamic characteristic of the problem, it firstly is formulated as a Markov Decision Process (MDP). Next, Deep Deterministic Policy Gradient (DDPG) algorithm is presented t...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2015